SPRINT: Scalable Policy Pre-Training via Language Instruction Relabeling
Pre-training robot policies with a rich set of skills can substantially
accelerate the learning of downstream tasks. Prior works have defined
pre-training tasks via natural language instructions, but doing so requires
tedious human annotation of hundreds of thousands of instructions. Thus, we
propose SPRINT, a scalable offline policy pre-training approach which
substantially reduces the human effort needed for pre-training a diverse set of
skills. Our method uses two core ideas to automatically expand a base set of
pre-training tasks: instruction relabeling via large language models and
cross-trajectory skill chaining through offline reinforcement learning. As a
result, SPRINT pre-training equips robots with a much richer repertoire of
skills. Experimental results in a household simulator and on a real robot
kitchen manipulation task show that SPRINT leads to substantially faster
learning of new long-horizon tasks than previous pre-training approaches.
Website at https://clvrai.com/sprint.
Comment: 29 pages, 18 figures. Published at ICRA 202
PATO: Policy Assisted TeleOperation for Scalable Robot Data Collection
Large-scale data is an essential component of machine learning as
demonstrated in recent advances in natural language processing and computer
vision research. However, collecting large-scale robotic data is much more
expensive and slower as each operator can control only a single robot at a
time. To make this costly data collection process efficient and scalable, we
propose Policy Assisted TeleOperation (PATO), a system which automates part of
the demonstration collection process using a learned assistive policy. PATO
autonomously executes repetitive behaviors in data collection and asks for
human input only when it is uncertain about which subtask or behavior to
execute. We conduct teleoperation user studies both with a real robot and a
simulated robot fleet and demonstrate that our assisted teleoperation system
reduces human operators' mental load while improving data collection
efficiency. Further, it enables a single operator to control multiple robots in
parallel, which is a first step towards scalable robotic data collection. For
code and video results, see https://clvrai.com/pato
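The uncertainty-gated hand-off described in the PATO abstract can be illustrated with a minimal sketch. It assumes, purely for illustration, that uncertainty is measured as disagreement among an ensemble of action predictions; the abstract does not specify the measure, and the names `ensemble_disagreement`, `assisted_step`, `ask_human`, and `threshold` are hypothetical.

```python
from statistics import pvariance
from typing import Callable, List, Sequence, Tuple


def ensemble_disagreement(predictions: Sequence[Sequence[float]]) -> float:
    """Mean per-dimension variance across ensemble members' predicted actions.

    Assumption: disagreement between ensemble members stands in for the
    policy's uncertainty. This is an illustrative choice, not PATO's.
    """
    dims = len(predictions[0])
    return sum(pvariance([p[d] for p in predictions]) for d in range(dims)) / dims


def mean_action(predictions: Sequence[Sequence[float]]) -> List[float]:
    """Average the ensemble's predicted actions dimension by dimension."""
    dims = len(predictions[0])
    return [sum(p[d] for p in predictions) / len(predictions) for d in range(dims)]


def assisted_step(
    predictions: Sequence[Sequence[float]],
    ask_human: Callable[[], List[float]],
    threshold: float,
) -> Tuple[List[float], bool]:
    """Return (action, human_was_asked).

    The policy acts on its own when the ensemble agrees and requests
    operator input only when disagreement exceeds the threshold.
    """
    if ensemble_disagreement(predictions) > threshold:
        return ask_human(), True
    return mean_action(predictions), False


# Hypothetical usage: three ensemble members that agree, then two that do not.
confident = [[0.5, 0.25], [0.5, 0.25], [0.5, 0.25]]
uncertain = [[1.0, 0.0], [-1.0, 0.0]]
operator = lambda: [0.0, 0.0]  # stand-in for a teleoperation command

auto_action, asked_when_confident = assisted_step(confident, operator, threshold=0.1)
human_action, asked_when_uncertain = assisted_step(uncertain, operator, threshold=0.1)
```

Under this kind of gating, an operator is only needed at the uncertain steps, which is what would let a single person attend to several robots in parallel.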
Q-Transformer: Scalable Offline Reinforcement Learning via Autoregressive Q-Functions
In this work, we present a scalable reinforcement learning method for
training multi-task policies from large offline datasets that can leverage both
human demonstrations and autonomously collected data. Our method uses a
Transformer to provide a scalable representation for Q-functions trained via
offline temporal difference backups. We therefore refer to the method as
Q-Transformer. By discretizing each action dimension and representing the
Q-value of each action dimension as separate tokens, we can apply effective
high-capacity sequence modeling techniques for Q-learning. We present several
design decisions that enable good performance with offline RL training, and
show that Q-Transformer outperforms prior offline RL algorithms and imitation
learning techniques on a large, diverse real-world robotic manipulation task
suite. The project's website and videos can be found at
https://q-transformer.github.io
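The per-dimension discretization described in the Q-Transformer abstract can be sketched as follows. This is a toy illustration under stated assumptions, not the authors' implementation: uniform bins are assumed, a plain Python callable `q_fn(state, prefix)` stands in for the Transformer, and all function names are hypothetical.

```python
from typing import Callable, List, Sequence


def discretize(value: float, low: float, high: float, num_bins: int) -> int:
    """Map a continuous action value in [low, high] to a uniform bin index."""
    clipped = min(max(value, low), high)
    frac = (clipped - low) / (high - low)
    # The upper boundary maps to the last bin rather than to num_bins.
    return min(int(frac * num_bins), num_bins - 1)


def undiscretize(index: int, low: float, high: float, num_bins: int) -> float:
    """Return the center of the bin with the given index."""
    width = (high - low) / num_bins
    return low + (index + 0.5) * width


def greedy_action_tokens(
    q_fn: Callable[[Sequence[int], Sequence[int]], Sequence[float]],
    state: Sequence[int],
    num_dims: int,
) -> List[int]:
    """Pick one bin per action dimension, one dimension at a time.

    q_fn(state, prefix) is a hypothetical stand-in for the sequence model:
    it returns one Q-value per bin of the next action dimension, conditioned
    on the bins already chosen for the earlier dimensions.
    """
    tokens: List[int] = []
    for _ in range(num_dims):
        q_values = q_fn(state, tokens)
        best_bin = max(range(len(q_values)), key=lambda i: q_values[i])
        tokens.append(best_bin)
    return tokens


def toy_q(state: Sequence[int], prefix: Sequence[int]) -> List[float]:
    """Toy Q-function: the best bin for dimension d is state[d] (4 bins)."""
    target = state[len(prefix)]
    return [-abs(i - target) for i in range(4)]


tokens = greedy_action_tokens(toy_q, state=[2, 0, 3], num_dims=3)
action = [undiscretize(t, -1.0, 1.0, 4) for t in tokens]
```

Treating each dimension as its own token keeps the maximization linear in the number of dimensions: with 4 bins and 3 dimensions, the loop evaluates 12 Q-values instead of the 64 needed to enumerate every joint action.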